ID: AUDIT-001 | Date: 2026-02-21

Triton SM90: AxisInfoAnalysis Dimension Collapse and C++ Low-Level Out-of-Bounds Diagnosis

TritonSM90Compiler ICE

@Qubitium @colesbury I’ve audited this segmentation fault. The crash inside create_module (the extension loader in importlib._bootstrap_external) points to a thread-safety violation caused by the mismatch between Triton’s legacy C-extension architecture and the free-threaded (no-GIL) runtime of Python 3.13t.

1. The Root Cause: Race Condition in Module Initialization

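In standard Python, the GIL effectively serializes the initialization of C-extensions, so init code that touches shared global state never runs concurrently with other Python threads. Under 3.13t with the GIL disabled, this protection is gone.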

2. Lack of Multi-phase Initialization (PEP 489) Support

Triton’s C-extension has likely not yet adopted Multi-phase Initialization and has not been marked as supporting free-threading (via Py_MOD_GIL_NOT_USED). Consequently, the internal memory allocators (like mimalloc in Python 3.13t) and the CUDA Driver API may experience conflicts when Triton attempts to bridge its un-isolated C++ state with the new lock-free interpreter state.
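A quick way to see which mode the interpreter is actually in is to query the build flag and the runtime GIL state. The sketch below uses only the standard library; gil_status is a hypothetical helper name, not part of Triton or GPTQModel. On a free-threaded build, CPython normally re-enables the GIL (with a RuntimeWarning) when it imports an extension that does not declare Py_MOD_GIL_NOT_USED, unless the GIL was forced off with PYTHON_GIL=0 or -X gil=0, so calling the helper before and after importing triton may show whether that happened.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this is a free-threaded build and whether the GIL is active.

    Hypothetical helper for diagnosis only.
    """
    # Py_GIL_DISABLED is 1 on free-threaded (3.13t) builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if is_gil_enabled is None else bool(is_gil_enabled())

    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(gil_status())
```

If gil_enabled is False after the import, the extension is running without the GIL, which is the configuration this analysis is concerned with.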


Temporary Workaround

Until Triton refactors its C-extension to be thread-isolated (removing global static pointers in driver.py), force the interpreter to re-enable the GIL:

PYTHON_GIL=1 CUDA_VISIBLE_DEVICES=7 pytest test_mimo.py

If the code runs without error under PYTHON_GIL=1, that strongly suggests the segfault stems from un-isolated concurrency within the Triton C-API layer.
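The A/B check above can also be scripted so both runs are directly comparable. This is a minimal sketch using only the standard library; run_with_gil and describe are hypothetical helper names, and the pytest command in the comment is the one from the workaround. On POSIX systems a process killed by a signal reports a negative return code, so a segfault shows up as SIGSEGV.

```python
import os
import signal
import subprocess
import sys


def run_with_gil(gil, cmd):
    """Run cmd in a child process with PYTHON_GIL set to gil ("0" or "1")."""
    env = dict(os.environ, PYTHON_GIL=gil)
    return subprocess.run(cmd, env=env).returncode


def describe(returncode):
    """Turn a subprocess return code into a readable outcome."""
    if returncode < 0:
        # Negative codes mean the child was terminated by that signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exit code {returncode}"


# Example comparison (assumes a 3.13t interpreter and the test file from the report):
#   cmd = [sys.executable, "-m", "pytest", "test_mimo.py"]
#   for gil in ("1", "0"):
#       print(f"PYTHON_GIL={gil}:", describe(run_with_gil(gil, cmd)))
```

PYTHON_GIL=0 is only accepted by free-threaded builds, so the comparison is meaningful only when run with the 3.13t interpreter. A clean exit with "1" and a SIGSEGV with "0" would be consistent with the diagnosis here, though it does not by itself pinpoint which C-level state is being raced.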

Verdict: This is an upstream compatibility gap in Triton’s backend rather than a bug in GPTQModel.